1) Exploratory Data Analysis (EDA)

• Load the dataset and perform initial data exploration.
• Summarize key statistics, including measures of central tendency and dispersion.
• Visualize data distributions (e.g., histograms) for relevant features.
• Identify any missing values and suggest appropriate strategies for handling them.
• Explore relationships between variables (e.g., correlation matrix, scatter plots).

1.1) Import Libraries

1.2) Import Dataset

1.3) Basic Analysis
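As a minimal sketch of this step, the snippet below summarizes a small synthetic DataFrame. The real dataset is not shown here, so the data is a hypothetical stand-in: the column names `chol`, `trestbps` and `oldpeak` are borrowed from the features named later in this notebook, while `target` and all values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset (values are synthetic).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "chol": rng.normal(245, 50, size=100).round(),
    "trestbps": rng.normal(130, 17, size=100).round(),
    "oldpeak": rng.exponential(1.0, size=100).round(1),
    "target": rng.integers(0, 2, size=100),
})
df.loc[[3, 17], "chol"] = np.nan  # inject two missing values for illustration

print(df.shape)    # number of rows and columns
print(df.dtypes)   # data type of each column

# Central tendency and dispersion: count, mean, std, min, quartiles, max.
summary = df.describe()
print(summary)

# Number of missing values per column.
missing = df.isnull().sum()
print(missing)
```

With the real data, `df` would come from the import step instead of being generated.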

Data Preprocessing:

• Prepare the data for modeling by addressing missing values and data quality issues.
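One possible way to address missing values is simple imputation, sketched below on a tiny hand-made frame. The values are invented for illustration, and treating `thal` as a categorical code is an assumption; other strategies (dropping rows, model-based imputation) may suit the real data better.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame with gaps (values are made up).
df = pd.DataFrame({
    "chol": [233.0, np.nan, 250.0, 204.0, np.nan],
    "thal": [2.0, 3.0, np.nan, 2.0, 2.0],
})

# Continuous feature: fill with the median, which is robust to outliers.
df["chol"] = df["chol"].fillna(df["chol"].median())

# Categorical-coded feature (assumed): fill with the most frequent value.
df["thal"] = df["thal"].fillna(df["thal"].mode()[0])

print(df)
```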

Explore relationships between variables (e.g., correlation matrix, scatter plots).

Use a correlation matrix to check which features/symptoms are related to each other.

3.1) Correlation matrix with heatmap

Correlation indicates how the features are related to each other or to the target variable. The correlation may be positive (an increase in the value of the feature goes with an increase in the target variable) or negative (an increase in the value of the feature goes with a decrease in the target variable). A heatmap makes it easy to identify which features are most relevant to the target variable.
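A rough sketch of computing the correlation matrix with pandas follows. The data is synthetic and constructed so that `oldpeak` is negatively related to `target`; the helper `plot_heatmap` is a hypothetical name, and it assumes seaborn and matplotlib are installed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200
oldpeak = rng.exponential(1.0, size=n)
# Synthetic target: more likely to be 1 when oldpeak is low.
target = (oldpeak + rng.normal(0, 0.5, size=n) < 1.0).astype(int)
df = pd.DataFrame({
    "oldpeak": oldpeak,
    "chol": rng.normal(245, 50, size=n),  # unrelated to the target here
    "target": target,
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr["target"].sort_values())


def plot_heatmap(corr_matrix):
    """Draw the matrix as an annotated heatmap (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()
```

Calling `plot_heatmap(corr)` would render the heatmap; values near -1 or 1 in the `target` row mark the features most strongly related to the target.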

From the correlation matrix above, we can conclude that the target variable (whether the person has cardiovascular disease) shows a strong negative correlation with ('exang', 'oldpeak', 'ca', 'thal') and a weak negative correlation with ('chol', 'fbs', 'trestbps').

Split the data into training and testing sets for model evaluation

4.1) Gathering the columns and splitting of the data into train and test set
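The split could look roughly like the sketch below, which uses scikit-learn's `train_test_split`. The frame is synthetic; the 80/20 ratio, the `random_state` and the `target` column name are assumptions rather than the settings used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a balanced binary target.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "exang": rng.integers(0, 2, size=100),
    "oldpeak": rng.exponential(1.0, size=100),
    "ca": rng.integers(0, 4, size=100),
    "target": np.tile([0, 1], 50),
})

# Gather the feature columns (X) and the target column (y).
X = df.drop(columns="target")
y = df["target"]

# Hold out 20% for testing; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```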

MODEL 1 (LOGISTIC REGRESSION)
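As an illustration only, a logistic regression classifier can be fit and scored as below. The data comes from `make_classification`, not from the heart disease dataset, and `max_iter=1000` is an assumed setting to avoid convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (stand-in for the real data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit on the training set only, then predict on the held-out test set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```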

Decision Tree

Support Vector Machine

Naive Bayes
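The decision tree, support vector machine and naive Bayes models listed above share the same fit/score interface in scikit-learn, so they can be compared in one loop. This is a sketch on synthetic data; the hyperparameters (`max_depth=4`, RBF kernel, Gaussian naive Bayes) are assumptions, not the choices made in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (stand-in for the real data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Support Vector Machine": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
}

# Fit each model on the training set and record its test accuracy.
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```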

According to the model's classification report, absence of heart disease was predicted correctly 91% of the time and presence of heart disease 83% of the time.

In the confusion matrix, there are 20 true positives and 27 true negatives, along with 12 false positives and 2 false negatives.
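The four confusion matrix counts determine the usual summary metrics. The arithmetic is sketched below, assuming that "positive" means presence of heart disease and that all four counts come from a single binary classifier.

```python
# Counts taken from the confusion matrix described above.
tp, tn, fp, fn = 20, 27, 12, 2

total = tp + tn + fp + fn

accuracy = (tp + tn) / total        # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are real
recall = tp / (tp + fn)             # share of real positives that are found
specificity = tn / (tn + fp)        # share of real negatives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"samples:     {total}")
print(f"accuracy:    {accuracy:.3f}")
print(f"precision:   {precision:.3f}")
print(f"recall:      {recall:.3f}")
print(f"specificity: {specificity:.3f}")
print(f"F1 score:    {f1:.3f}")
```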

As we get the prediction of the data

Splitting the data into train and test sets

Feature scaling and transformation
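A common way to scale features is standardization, sketched here with scikit-learn's `StandardScaler` on synthetic data. Whether this notebook used standardization or another transform is not stated, so treat the choice as an assumption. The scaler is fit on the training set only, so no information from the test set leaks into the preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Two synthetic features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(245, 50, size=100),
    rng.normal(1.0, 0.5, size=100),
])
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(3))  # approximately 0 per feature
print(X_train_scaled.std(axis=0).round(3))   # approximately 1 per feature
```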

Regression Task (Customer Lifetime Value Prediction):

• Build a regression model for prediction.
• Select an appropriate regression algorithm (e.g., linear regression, ridge regression, or any of your choice).
• Train the model on the training set and evaluate its performance on the testing set.
• Assess model performance using relevant regression metrics (e.g., RMSE, R-squared).
• Visualize the model's predictions and actual values.
• Provide insights into the factors that contribute most to the model.

Model (Linear Regression)
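A minimal linear regression sketch follows. The data is synthetic, generated from known coefficients (`true_coef` is a hypothetical name) so the fitted coefficients can be checked; with real data, coefficients fitted on standardized features give a rough indication of which factors contribute most.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: y is a linear function of X plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([3.0, -2.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

r2 = r2_score(y_test, y_pred)
print(f"R-squared on the test set: {r2:.3f}")

# Larger absolute coefficients mean a larger effect per unit of the feature.
for i, coef in enumerate(reg.coef_):
    print(f"feature {i}: {coef:+.3f}")
```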

Evaluation of the model

RMSE
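RMSE is the square root of the mean squared difference between actual and predicted values, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. Both are worked through below on three made-up values, not results from this notebook.

```python
import numpy as np

# Made-up actual and predicted values for illustration.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_true - y_pred            # residuals: [1, 0, -2]
mse = np.mean(errors ** 2)          # mean squared error: 5 / 3
rmse = np.sqrt(mse)                 # same units as the target

ss_res = np.sum(errors ** 2)                    # residual sum of squares: 5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 8
r2 = 1 - ss_res / ss_tot

print(f"RMSE: {rmse:.4f}")
print(f"R-squared: {r2:.4f}")
```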

Feature Engineering:

• Create additional features that might improve the classification and regression models.
• Explain the rationale behind feature engineering choices.
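Two hypothetical engineered features are sketched below: a ratio of two existing columns and a binary flag from a threshold. The feature names `chol_per_bp` and `high_chol`, the sample values and the cutoff of 240 are illustrative assumptions, not features used in this notebook.

```python
import pandas as pd

# Small made-up sample using two column names from the dataset.
df = pd.DataFrame({
    "chol": [233, 250, 204, 286],
    "trestbps": [145, 130, 130, 140],
})

# Ratio feature: relates cholesterol to resting blood pressure, in case
# the combination carries signal that neither column has on its own.
df["chol_per_bp"] = df["chol"] / df["trestbps"]

# Threshold flag: lets a linear model pick up a step change at the cutoff.
df["high_chol"] = (df["chol"] > 240).astype(int)

print(df)
```

Whether such features help should be checked by comparing model scores with and without them.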

4.1) Gathering the columns and splitting of the data into train and test set

Use a feature engineering or feature reduction technique to reduce the size of the data for efficient modelling.
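One widely used feature reduction technique is principal component analysis (PCA), sketched here with scikit-learn. The technique, the 95% variance threshold and the synthetic data are all assumptions; the notebook does not say which method it applied.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Six synthetic features: three independent ones plus three noisy copies,
# so half of the columns are nearly redundant.
rng = np.random.default_rng(0)
base = rng.normal(size=(150, 3))
X = np.hstack([base, base + rng.normal(scale=0.05, size=(150, 3))])

# PCA is sensitive to scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Keep the fewest components that explain at least 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_.round(3))
```

In a modelling pipeline the scaler and PCA would be fit on the training set only and then applied to the test set.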